21 research outputs found

    Statistical Learning Algorithm for Tree Similarity

    Full text link
    Tree edit distance is one of the most frequently used dis-tance measures for comparing trees. When using the tree edit distance, we need to determine the cost of each oper-ation, but this is a labor-intensive and highly skilled task. This paper proposes an algorithm for learning the costs of tree edit operations from training data consisting of pairs of similar trees. To formalize the cost learning problem, we define a probabilistic model for tree alignment that is a variant of tree edit distance. Then, the parameters of the model are estimated using the expectation maximization (EM) technique. In this paper, we develop an algorithm for parameter learning that is polynomial in time (O(mn2d6)) and space (O(n2d4)) where n, d, and m represent the size of the trees, the maximum degree of trees, and the number of training pairs of trees, respectively. 1

    A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures

    Get PDF
    [Background]Measuring similarities between tree structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparison of tree structured data. However, it is known that computation of the edit distance for rooted unordered trees is NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees. [Results]In this paper, we present a practical method for computing the edit distance between rooted unordered trees. In this method, the edit distance problem for unordered trees is transformed into the maximum clique problem and then efficient solvers for the maximum clique problem are applied. We applied the proposed method to similar structure search for glycan structures. The result suggests that our proposed method can efficiently compute the edit distance for moderate size unordered trees. It also suggests that the proposed method has the accuracy comparative to those by the edit distance for ordered trees and by an existing method for glycan search. [Conclusions]The proposed method is simple but useful for computation of the edit distance between unordered trees. The object code is available upon request

    Modeling topical trends over continuous time with priors

    Get PDF
    In this paper, we propose a new method for topical trend analysis. We model topical trends by per-topic Beta distributions as in Topics over Time (TOT), proposed as an extension of latent Dirichlet allocation (LDA). However, TOT is likely to overfit to timestamp data in extracting latent topics. Therefore, we apply prior distributions to Beta distributions in TOT. Since Beta distribution has no conjugate prior, we devise a trick, where we set one among the two parameters of each per-topic Beta distribution to one based on a Bernoulli trial and apply Gamma distribution as a conjugate prior. Consequently, we can marginalize out the parameters of Beta distributions and thus treat timestamp data in a Bayesian fashion. In the evaluation experiment, we compare our method with LDA and TOT in link detection task on TDT4 dataset. We use word predictive probabilities as term weights and estimate document similarities by using those weights in a TFIDF-like scheme. The results show that our method achieves a moderate fitting to timestamp data.Advances in Neural Networks - ISNN 2010 : 7th International Symposium on Neural Networks, ISNN 2010, Shanghai, China, June 6-9, 2010, Proceedings, Part IIThe original publication is available at www.springerlink.co

    Dynamic hyperparameter optimization for bayesian topical trend analysis

    Get PDF
    This paper presents a new Bayesian topical trend analysis. We regard the parameters of topic Dirichlet priors in latent Dirichlet allocation as a function of document timestamps and optimize the parameters by a gradient-based algorithm. Since our method gives similar hyperparameters to the documents having similar timestamps, topic assignment in collapsed Gibbs sampling is affected by timestamp similarities. We compute TFIDF-based document similarities by using a result of collapsed Gibbs sampling and evaluate our proposal by link detection task of Topic Detection and Tracking.Proceeding of the 18th ACM conference : Hong Kong, China, 2009.11.02-2009.11.0

    バイオインフォマティクス ニ オケル コウゾウ データ ニ タイスル リサン サイテキカ アルゴリズム

    No full text
    京都大学0048新制・課程博士博士(情報学)甲第12437号情博第191号新制||情||43(附属図書館)24273UT51-2006-J428京都大学大学院情報学研究科知能情報学専攻(主査)教授 阿久津 達也, 教授 岡部 寿男, 教授 永持 仁学位規則第4条第1項該当Doctor of InformaticsKyoto UniversityDA

    Approximating Tree Edit Distance through String Edit Distance

    Get PDF
    We present an algorithm to approximate edit distance between two ordered and rooted trees of bounded degree. In this algorithm, each input tree is transformed into a string by computing the Euler string, where labels of some edges in the input trees are modified so that structures of small subtrees are reflected to the labels. We show that the edit distance between trees is at least 1/6 and at most O(n 3/4) of the edit distance between the transformed strings, where n is the maximum size of two input trees and we assume unit cost edit operations for both trees and strings. The algorithm works in O(n 2) time since computation of edit distance and reconstruction of tree mapping from string alignment takes O(n 2) time though transformation itself can be done in O(n) time

    Inferring a graph from path frequency

    Get PDF
    This paper considers the problem of inferring a graph from the number of occurrences of vertex-labeled paths, which is closely related to the pre-image problem for graphs: to reconstruct a graph from its feature space representation. It is shown that both exact and approximate versions of the problem can be solved in polynomial time in the size of an output graph by using dynamic programming algorithms if the graphs are trees whose maximum degree is bounded by a constant and the lengths of given paths and alphabet size are bounded by constants. On the other hand, it is shown that this problem is strongly NP-hard even for trees of bounded degree if the maximum length of paths is not bounded. The problem of inferring a string from the number of occurrences of fixed size substrings is also studied

    Exact algorithms for computing the tree edit distance between unordered trees

    Get PDF
    This paper presents a fixed-parameter algorithm for the tree edit distance problem for unordered trees under the unit cost model that works in O(2.62^k⋅poly(n)) time and O(n^2) space, where the parameter k is the maximum bound of the edit distance and n is the maximum size of input trees. This paper also presents polynomial-time algorithms for the case where the maximum degree of the largest common subtree is bounded by a constan

    AI活動による古典研究の可能性

    No full text

    Approximation and parameterized algorithms for common subtrees and edit distance between unordered trees

    Get PDF
    Given two rooted, labeled, unordered trees, the common subtree problem is to find a bijective matching between subsets of nodes of the trees of maximum cardinality which preserves labels and ancestry relationship. The tree edit distance problem is to determine the least cost sequence of insertions, deletions and substitutions that converts a tree into another given tree. Both problems are known to be hard to approximate within some constant factor in general. We tackle these problems from two perspectives: giving exact algorithms, either for special cases or in terms of some parameters; and approximation algorithms and hardness of approximation. We present a parameterized algorithm in terms of the number of branching nodes that solves both problems and yields polynomial algorithms for several special classes of trees. This is complemented with a tighter APX-hardness proof that holds when the trees are of height one and two, respectively. Furthermore, we present the first approximation algorithms for both problems. In particular, for the common subtree problem for t trees, we present an algorithm achieving a tlog2(bOPT+1) ratio, where bOPT is the number of branching nodes in the optimal solution. We also present constant factor approximation algorithms for both problems in the case of bounded height trees
    corecore